Methods for capturing spectro-temporal modulations in automatic speech recognition

Author

Michael Kleinschmidt

Abstract

Psychoacoustical and neurophysiological results indicate that spectro-temporal modulations play an important role in sound perception. Speech signals, in particular, exhibit distinct spectro-temporal patterns that are well matched by the receptive fields of cortical neurons. To improve the performance of automatic speech recognition (ASR) systems, a number of different approaches are presented, all of which aim to capture spectro-temporal modulations. The tuning of neurons towards different envelope fluctuations is modeled by deriving secondary features from the output of a perception model. The following types of secondary features are introduced: the product of two or more windows (sigma-pi cells) of variable size in the spectro-temporal representation, a fuzzy-logical combination of windows, and a Gabor function that models the shape of receptive fields of cortical neurons. The different approaches are tested on a simple isolated word recognition task and compared to a standard Hidden Markov Model recognition system. The results show that all types of secondary features are suitable for ASR. Gabor secondary features, in particular, yield robust performance in additive noise that is comparable, and in some conditions superior, to the Aurora 2 reference system.
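To make two of the secondary feature types named in the abstract more concrete, here is a minimal sketch in Python with NumPy. It is not the author's implementation: the abstract gives no formulas, so the use of mean window energy for the sigma-pi product, the Gaussian envelope widths, the cosine carrier, the removal of the kernel mean, and all function and parameter names (`sigma_pi_feature`, `gabor_kernel`, `gabor_feature`, `omega_f`, `omega_t`) are assumptions made for illustration. A plain 2D array stands in for the output of the perception model.

```python
import numpy as np


def sigma_pi_feature(spec, windows):
    """Product of the mean values of several rectangular windows.

    spec    : 2D array, frequency channels x time frames (assumed layout)
    windows : iterable of (f0, f1, t0, t1) half-open index ranges
    """
    value = 1.0
    for f0, f1, t0, t1 in windows:
        value *= spec[f0:f1, t0:t1].mean()
    return float(value)


def gabor_kernel(n_freq, n_time, omega_f, omega_t):
    """Real 2D Gabor function: Gaussian envelope times a cosine carrier.

    omega_f, omega_t are the spectral and temporal modulation
    frequencies in radians per channel and per frame (assumed units).
    """
    f = np.arange(n_freq) - (n_freq - 1) / 2.0
    t = np.arange(n_time) - (n_time - 1) / 2.0
    ff, tt = np.meshgrid(f, t, indexing="ij")
    # Envelope widths tied to the kernel size; an arbitrary choice here.
    envelope = np.exp(-0.5 * ((ff / (n_freq / 4.0)) ** 2
                              + (tt / (n_time / 4.0)) ** 2))
    carrier = np.cos(omega_f * ff + omega_t * tt)
    kernel = envelope * carrier
    # Subtract the mean so a constant offset in the input gives no response.
    return kernel - kernel.mean()


def gabor_feature(spec, kernel, f_center, t_center):
    """Inner product of the kernel with the patch centred at the given cell."""
    n_freq, n_time = kernel.shape
    f0 = f_center - n_freq // 2
    t0 = t_center - n_time // 2
    patch = spec[f0:f0 + n_freq, t0:t0 + n_time]
    return float(np.sum(patch * kernel))


# Toy input: a flat "spectrogram" with one brighter region.
spec = np.ones((20, 50))
spec[2:4, 5:10] = 2.0

sp = sigma_pi_feature(spec, [(2, 4, 5, 10), (10, 12, 30, 35)])
kernel = gabor_kernel(9, 15, omega_f=0.5, omega_t=0.3)
flat_response = gabor_feature(np.ones((20, 50)), kernel, 10, 25)
```

In this sketch a sigma-pi feature is large only when all of its windows are active at once, while the Gabor feature responds to a specific combination of spectral and temporal modulation frequency and ignores a constant background level.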

Similar resources

Spectro-temporal Modulations for Robust Speech Emotion Recognition

Spectro-temporal directional derivative features for automatic speech recognition

We introduce a novel spectro-temporal representation of speech by applying directional derivative filters to the Mel spectrogram, with the aim of improving the robustness of automatic speech recognition. Previous studies have shown that two-dimensional wavelet functions, when tuned to appropriate spectral scales and temporal rates, are able to accurately capture the acoustic modulations of speec...

Neural Responses to Speech-Specific Modulations Derived from a Spectro-Temporal Filter Bank

This paper analyzes the application of methods developed in automatic speech recognition (ASR) to better understand neural activity measured with electrocorticography (ECoG) during the presentation of speech. ECoG data is collected from temporal cortex in two subjects listening to a matrix sentence test. We investigate the relation of ECoG signals and acoustic speech that has been processed wit...

Spectro-temporal modulations for robust speech emotion recognition

Speech emotion recognition is mostly considered in clean speech. In this paper, joint spectro-temporal features (RS features) are extracted from an auditory model and are applied to detect the emotion status of noisy speech. The noisy speech is derived from the Berlin Emotional Speech database with added white and babble noises under various SNR levels. The clean train/noisy test scenario is in...

Session 2pSCa: Speech Communication 2pSCa2. Improving automatic speech recognition by learning from human errors

This work presents a series of experiments that compare the performance of human speech recognition (HSR) and automatic speech recognition (ASR). The goal of this line of research is to learn from the differences between HSR and ASR, and to use this knowledge to incorporate new signal processing strategies from the human auditory system in automatic classifiers. A database with noisy nonsense u...

Publication date: 2001
